17,705 results on '"Zhang, Shuai"'
Search Results
2. Volume Calculation and Error Analysis of the Working Space of the Manipulator Based on the Principle of Calculus
- Author
-
Wang, Chuanjiang, primary, Song, Jian, additional, Zhang, Shuai, additional, Yang, Sen, additional, and Sun, Xiujuan, additional
- Published
- 2024
3. YOLO-OCR: End-to-end Compound Figure Separation and Label Recognition of Images in Scientific Publications
- Author
-
Meng, Shuo, primary, Liang, Xinshuo, additional, Zhang, Shuai, additional, Lei, Leqi, additional, Wu, Hanbai, additional, Iqbal, Saira, additional, and Hu, Jinlian, additional
- Published
- 2024
4. Analysis of Distributed Cross-Domain Anti-sea Combat Command and Decision-Making Capability Enhancement
- Author
-
Zhao, Xinye, primary, Kou, Zhu, additional, Zhang, Shuai, additional, and Liu, Peng, additional
- Published
- 2024
5. Multi-scale Transformer with Decoder for Image Quality Assessment
- Author
-
Zhang, Shuai, primary and Liu, Yutao, additional
- Published
- 2024
6. Can large language models understand uncommon meanings of common words?
- Author
-
Wu, Jinyang, Che, Feihu, Zheng, Xinxin, Zhang, Shuai, Jin, Ruihan, Nie, Shuai, Shao, Pengpeng, and Tao, Jianhua
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, lacking widely acknowledged testing mechanisms, answering `whether LLMs are stochastic parrots or genuinely comprehend the world' remains unclear, fostering numerous studies and sparking heated debates. Prevailing research mainly focuses on surface-level NLU, neglecting fine-grained explorations. However, such explorations are crucial for understanding their unique comprehension mechanisms, aligning with human cognition, and finally enhancing LLMs' general NLU capacities. To address this gap, our study delves into LLMs' nuanced semantic comprehension capabilities, particularly regarding common words with uncommon meanings. The idea stems from foundational principles of human communication within psychology, which underscore accurate shared understandings of word semantics. Specifically, this paper presents the innovative construction of a Lexical Semantic Comprehension (LeSC) dataset with novel evaluation metrics, the first benchmark encompassing both fine-grained and cross-lingual dimensions. Introducing models of both open-source and closed-source, varied scales and architectures, our extensive empirical experiments demonstrate the inferior performance of existing models in this basic lexical-meaning understanding task. Notably, even the state-of-the-art LLMs GPT-4 and GPT-3.5 lag behind 16-year-old humans by 3.9% and 22.3%, respectively. Additionally, multiple advanced prompting techniques and retrieval-augmented generation are also introduced to help alleviate this trouble, yet limitations persist. By highlighting the above critical shortcomings, this research motivates further investigation and offers novel insights for developing more intelligent LLMs.
- Published
- 2024
7. Detecting the spread of valence band Wannier functions by optical sum rules
- Author
-
Cárdenas-Castillo, Luis F., Zhang, Shuai, Kochan, Denis, Freire Jr., Fernando L., and Chen, Wei
- Subjects
Condensed Matter - Materials Science
- Abstract
The spread of valence band Wannier functions in semiconductors and insulators is a characteristic property that gives a rough estimation of how insulating the material is. We elaborate that the gauge-invariant part of the spread can be extracted experimentally from optical conductivity and absorbance, owing to their equivalence to the quantum metric of the valence band states integrated over momentum. Because the quantum metric enters the matrix element of optical conductivity, the spread of valence band Wannier functions in the gapped 3D materials can be obtained from the frequency-integration of the imaginary part of the dielectric function. We demonstrate this practically for typical semiconductors like Si and Ge, and for topological insulators like Bi$_{2}$Te$_{3}$. In 2D materials, the spread of Wannier functions in the valence bands can be obtained from the absorbance divided by frequency and then integrated over frequency. Applying this method to graphene reveals a finite spread caused by intrinsic spin-orbit coupling, which may be detected by absorbance in the microwave range. The absorbance of twisted bilayer graphene in the millimeter wave range can be used to detect the formation of the flat bands and quantify their quantum metric. Finally, we apply our method to hexagonal transition metal dichalcogenides MX$_{2}$ (M = Mo, W; X = S, Se, Te) and demonstrate how other effects like substrate, excitons, and higher energy bands can affect the spread of Wannier function.
- Comment
13 pages, 7 figures
- Published
- 2024
8. Volume growth and positive scalar curvature
- Author
-
Wei, Guodong, Xu, Guoyi, and Zhang, Shuai
- Subjects
Mathematics - Differential Geometry
- Abstract
For three dimensional complete, non-compact Riemannian manifolds with non-negative Ricci curvature and uniformly positive scalar curvature, we obtain the sharp linear volume growth ratio and the corresponding rigidity.
- Comment
submitted to some journal
- Published
- 2024
9. FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data
- Author
-
Jing, Shusen, Yu, Anlan, Zhang, Shuai, and Zhang, Songyang
- Subjects
Computer Science - Machine Learning, Computer Science - Cryptography and Security, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Recent efforts have been made to integrate self-supervised learning (SSL) with the framework of federated learning (FL). One unique challenge of federated self-supervised learning (FedSSL) is that the global objective of FedSSL usually does not equal the weighted sum of local SSL objectives. Consequently, conventional approaches, such as federated averaging (FedAvg), fail to precisely minimize the FedSSL global objective, often resulting in suboptimal performance, especially when data is non-i.i.d. To fill this gap, we propose a provable FedSSL algorithm, named FedSC, based on the spectral contrastive objective. In FedSC, clients share correlation matrices of data representations in addition to model weights periodically, which enables inter-client contrast of data samples in addition to intra-client contrast and contraction, resulting in improved quality of data representations. Differential privacy (DP) protection is deployed to control the additional privacy leakage on local datasets when correlation matrices are shared. We also provide theoretical analysis on the convergence and extra privacy leakage. The experimental results validate the effectiveness of our proposed algorithm.
- Published
- 2024
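For readers unfamiliar with the baseline named in the FedSC abstract, the following is a minimal sketch of plain federated averaging (FedAvg), in which the server combines client parameters weighted by local sample counts. It is not the FedSC algorithm and is not taken from the paper; the function name `fedavg` and the flat-list representation of model weights are assumptions made for illustration.

```python
from typing import Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> list[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Hypothetical helper: models are represented as flat lists of floats.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one positive sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        share = size / total  # this client's fraction of all samples
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


if __name__ == "__main__":
    # Two clients holding 1 and 3 samples respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

The abstract's point is that this weighted average targets the weighted sum of local objectives, which generally differs from the FedSSL global objective.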
10. CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving
- Author
-
Chen, Pei, Han, Boran, and Zhang, Shuai
- Subjects
Computer Science - Computation and Language
- Abstract
Large Language Models (LLMs) have shown great ability in solving traditional natural language tasks and elementary reasoning tasks with appropriate prompting techniques. However, their ability is still limited in solving complicated science problems. In this work, we aim to push the upper bound of the reasoning capability of LLMs by proposing a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework. Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task. In particular, we discover that applying different reasoning paths for different roles is an effective strategy to implement few-shot prompting approaches in the multi-agent scenarios. Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems over competitive baselines. Our further analysis shows the necessity of prompting LLMs to play different roles or experts independently. We release the code at: https://github.com/amazon-science/comm-prompt
- Comment
Accepted to NAACL 2024
- Published
- 2024
11. KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering
- Author
-
Zheng, Xinxin, Che, Feihu, Wu, Jinyang, Zhang, Shuai, Nie, Shuai, Liu, Kang, and Tao, Jianhua
- Subjects
Computer Science - Computation and Language
- Abstract
Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. A promising approach is to leverage evidence documents as extra supporting knowledge, which can be obtained through retrieval or generation. However, existing methods directly leverage the entire contents of the evidence document, which may introduce noise information and impair the performance of large language models. To tackle this problem, we propose a novel Knowledge Selection of Large Language Models (KS-LLM) method, aiming to identify valuable information from evidence documents. The KS-LLM approach utilizes triples to effectively select knowledge snippets from evidence documents that are beneficial to answering questions. Specifically, we first generate triples based on the input question, then select the evidence sentences most similar to triples from the evidence document, and finally combine the evidence sentences and triples to assist large language models in generating answers. Experimental comparisons on several question answering datasets, such as TriviaQA, WebQ, and NQ, demonstrate that the proposed method surpasses the baselines and achieves the best results.
- Published
- 2024
12. Feasibility Study of Function Splits in RAN Architectures with LEO Satellites
- Author
-
Seeram, Siva Satya Sri Ganesh, Feltrin, Luca, Ozger, Mustafa, Zhang, Shuai, and Cavdar, Cicek
- Subjects
Computer Science - Networking and Internet Architecture
- Abstract
This paper explores the evolution of Radio Access Network (RAN) architectures and their integration into Non-Terrestrial Networks (NTN) to address escalating mobile traffic demands. Focusing on Low Earth Orbit (LEO) satellites as key components of NTN, we examine the feasibility of RAN function splits (FSs) in terms of fronthaul (FH) latency, elevation angle, and bandwidth (BW) across LEO satellites and ground stations (GS), alongside evaluating performance of Conditional Handover (CHO) procedures under diverse scenarios. By assessing performance metrics such as handover duration, disconnection time, and control traffic volume, we provide insights on several aspects such as stringent constraints for Low Layer Splits (LLSs), leading to longer delays during mobility procedures and increased control traffic across the feeder link in comparison with the case when gNodeB is onboard satellite. Despite challenges, LLSs demonstrate minimal onboard satellite computational requirements, promising reduced power consumption and payload weight. These findings underscore the architectural possibilities and challenges within the telecommunications industry, paving the way for future advancements in NTN RAN design and operation.
- Comment
6 pages, 6 figures, EuCNC conference 2024
- Published
- 2024
13. Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction
- Author
-
Bonilla, Sierra, Zhang, Shuai, Psychogyios, Dimitrios, Stoyanov, Danail, Vasconcelos, Francisco, and Bano, Sophia
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with an 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100X faster rendering and more than 10X shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer.
- Comment
12 pages, 5 figures
- Published
- 2024
14. Nonlinear Hall effect and scaling law in Sb-doped topological insulator MnBi4Te7
- Author
-
Wang, Shaoyu, Li, Xiubing, Zhang, Heng, Chen, Bo, Xie, Hangkai, Li, Congcong, Fei, Fucong, Zhang, Shuai, and Song, Fengqi
- Subjects
Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Nonlinear Hall effect (NLHE), as a new member of Hall effect family, has been realized in many materials, attracting a great deal of attention. Here, we report the observation of NLHE in magnetic topological insulator Sb-doped MnBi4Te7 flakes. The NLHE generation efficiency can reach up to 0.06 V^-1, which is comparable to that observed in MnBi2Te4. Differently, the NLHE can survive up to 200 K, much larger than the magnetic transition temperature. We further study the scaling behavior of the NLHE with longitudinal conductivity. The linear relationship with opposite slope when temperature is below and above the magnetic transition temperature is uncovered. It reveals that the NLHE originates from skew scattering. Our work provides a platform to search NLHE with larger generation efficiency at higher temperatures.
- Published
- 2024
15. Bridging Remote Sensors with Multisensor Geospatial Foundation Models
- Author
-
Han, Boran, Zhang, Shuai, Shi, Xingjian, and Reichstein, Markus
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
In the realm of geospatial analysis, the diversity of remote sensors, encompassing both optical and microwave technologies, offers a wealth of distinct observational capabilities. Recognizing this, we present msGFM, a multisensor geospatial foundation model that effectively unifies data from four key sensor modalities. This integration spans an expansive dataset of two million multisensor images. msGFM is uniquely adept at handling both paired and unpaired sensor data. For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach in masked image modeling, enabling the synthesis of joint representations from diverse sensors. msGFM, incorporating four remote sensors, upholds strong performance, forming a comprehensive model adaptable to various sensor types. msGFM has demonstrated enhanced proficiency in a range of both single-sensor and multisensor downstream tasks. These include scene classification, segmentation, cloud removal, and pan-sharpening. A key discovery of our research is that representations derived from natural images are not always compatible with the distinct characteristics of geospatial remote sensors, underscoring the limitations of existing representations in this field. Our work can serve as a guide for developing multisensor geospatial pretraining models, paving the way for more advanced geospatial capabilities.
- Comment
Accepted to CVPR
- Published
- 2024
16. TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
- Author
-
Zhang, Shuai, Zhao, Huangxuan, Zhou, Zhenghong, Wu, Guanjun, Zheng, Chuansheng, Wang, Xinggang, and Liu, Wenyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
- Abstract
Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined. This enables us to render the 2D DSA image at that specific moment. Additionally, we introduced a Smooth loss term in the loss function to mitigate overfitting issues that may arise in the model when dealing with sparse view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. The code will be publicly available.
- Published
- 2024
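The TOGS abstract describes interpolating a per-Gaussian opacity offset table to obtain the opacity at a given time. A minimal sketch of that idea follows. The abstract does not specify the interpolation scheme or how the offset is combined with the base opacity, so linear interpolation over uniformly spaced time knots, additive offsets, and clamping to [0, 1] are all assumptions here, as are the names `interpolate_offset` and `opacity_at`.

```python
from typing import Sequence


def interpolate_offset(table: Sequence[float], t: float) -> float:
    """Linearly interpolate an offset table at normalized time t in [0, 1].

    Assumption: table entries sit at uniformly spaced time knots.
    """
    if len(table) < 2:
        raise ValueError("offset table needs at least two entries")
    t = min(max(t, 0.0), 1.0)           # clamp time to the valid range
    pos = t * (len(table) - 1)          # fractional index into the table
    lo = int(pos)
    hi = min(lo + 1, len(table) - 1)
    frac = pos - lo
    return (1.0 - frac) * table[lo] + frac * table[hi]


def opacity_at(base_opacity: float, table: Sequence[float], t: float) -> float:
    """Base opacity plus the interpolated offset, clamped to [0, 1] (assumed)."""
    value = base_opacity + interpolate_offset(table, t)
    return min(max(value, 0.0), 1.0)


if __name__ == "__main__":
    offsets = [0.0, 0.2, -0.1]          # hypothetical table for one Gaussian
    print(opacity_at(0.5, offsets, 0.25))
```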
17. Evolution of flat band and role of lattice relaxations in twisted bilayer graphene
- Author
-
Li, Qian, Zhang, Hongyun, Wang, Yijie, Chen, Wanying, Bao, Changhua, Liu, Qinxin, Lin, Tianyun, Zhang, Shuai, Zhang, Haoxiong, Watanabe, Kenji, Taniguchi, Takashi, Avila, Jose, Dudin, Pavel, Li, Qunyang, Yu, Pu, Duan, Wenhui, Song, Zhida, and Zhou, Shuyun
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
- Abstract
Magic-angle twisted bilayer graphene (MATBG) exhibits correlated phenomena such as superconductivity and Mott insulating state related to the weakly dispersing flat band near the Fermi energy. Beyond its moiré period, such flat band is expected to be sensitive to lattice relaxations. Thus, clarifying the evolution of the electronic structure with twist angle is critical for understanding the physics of MATBG. Here, we combine nanospot angle-resolved photoemission spectroscopy and atomic force microscopy to resolve the fine electronic structure of the flat band and remote bands, and their evolution with twist angles from 1.07$^\circ$ to 2.60$^\circ$. Near the magic angle, dispersion is characterized by a flat band near the Fermi energy with a strongly reduced bandwidth. Moreover, near 1.07$^\circ$, we observe a spectral weight transfer between remote bands at higher binding energy and extract the modulated interlayer spacing near the magic angle. Our work provides direct spectroscopic information on flat band physics and highlights the role of lattice relaxations.
- Comment
22 pages, 5 figures, Nature Materials, in press
- Published
- 2024
18. Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models
- Author
-
Qiu, Huachuan, Zhang, Shuai, He, Hongliang, Li, Anqi, and Lan, Zhenzhong
- Subjects
Computer Science - Computation and Language
- Abstract
Pornographic content occurring in human-machine interaction dialogues can cause severe side effects for users in open-domain dialogue systems. However, research on detecting pornographic language within human-machine interaction dialogues is an important subject that is rarely studied. To advance in this direction, we introduce CensorChat, a dialogue monitoring dataset aimed at detecting whether the dialogue session contains pornographic content. To this end, we collect real-life human-machine interaction dialogues in the wild and break them down into single utterances and single-turn dialogues, with the last utterance spoken by the chatbot. We propose utilizing knowledge distillation of large language models to annotate the dataset. Specifically, first, the raw dataset is annotated by four open-source large language models, with the majority vote determining the label. Second, we use ChatGPT to update the empty label from the first step. Third, to ensure the quality of the validation and test sets, we utilize GPT-4 for label calibration. If the current label does not match the one generated by GPT-4, we employ a self-criticism strategy to verify its correctness. Finally, to facilitate the detection of pornographic text, we develop a series of text classifiers using a pseudo-labeled dataset. Detailed data analysis demonstrates that leveraging knowledge distillation techniques with large language models provides a practical and cost-efficient method for developing pornographic text detectors.
- Comment
Accepted to CSCWD 2024 (27th International Conference on Computer Supported Cooperative Work in Design). arXiv admin note: text overlap with arXiv:2309.09749
- Published
- 2024
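The first annotation step in the CensorChat abstract, a majority vote over labels from four annotator models, can be sketched as below. This is an illustration only: the abstract does not say how ties are handled, so returning `None` (an "empty label" to be filled in by a later step) when the top two labels tie is an assumption, and the function name `majority_label` and the label strings are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None when the top labels tie.

    Assumption: a tie (or no votes) yields an empty label for later review.
    """
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single winner among the annotators
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical labels from four annotator models for one utterance.
    print(majority_label(["unsafe", "unsafe", "safe", "unsafe"]))
    print(majority_label(["unsafe", "safe", "unsafe", "safe"]))
```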
19. New constraints on Triton's atmosphere from the 6 October 2022 stellar occultation
- Author
-
Yuan, Ye, Zhang, Chen, Li, Fan, Chen, Jian, Fu, Yanning, Bai, Chunhai, Gao, Xing, Wang, Yong, Zhong, Tuhong, Gao, Yixing, Wang, Liang, Chen, Donghua, Zhang, Yixing, Zhang, Yang, Xie, Wenpeng, Zhang, Shupi, Liu, Ding, Cao, Jun, Yin, Xiangdong, Mo, Xiaojun, Liu, Jing, Han, Xinru, Liu, Tong, Chen, Yuqiang, Gao, Zhendong, Zeng, Xiang, Niu, Guihua, Zheng, Xiansheng, Lin, Yuchen, Ye, Peiyu, Liang, Weitang, Zhu, Chengcheng, Hu, Zhiqiang, He, Jianguo, Zhang, Wei, Chen, Yue, Cheng, Zhuo, Sun, Tianrui, Guo, Chenyang, Lu, Yue, Lin, Jiajun, Tan, Wei, Zhou, Jia, Xu, Jun, He, Jun, Ye, Jiahui, Li, Delai, Zhang, Shuai, and Qu, Qingyue
- Subjects
Astrophysics - Earth and Planetary Astrophysics
- Abstract
The atmosphere of Triton was probed directly by observing a ground-based stellar occultation on 6 October 2022. This rare event yielded 23 positive light curves collected from 13 separate observation stations contributing to our campaign. The significance of this event lies in its potential to directly validate the modest pressure fluctuation on Triton, a phenomenon not definitively verified by previous observations, including only five stellar occultations, and the Voyager 2 radio occultation in 1989. Using an approach consistent with a comparable study, we precisely determined a surface pressure of $14.07_{-0.13}^{+0.21}~\mathrm{\mu bar}$ in 2022. This new pressure rules out any significant monotonic variation in pressure between 2017 and 2022 through direct observations, as it is in alignment with the 2017 value. Additionally, both the pressures in 2017 and 2022 align with the 1989 value. This provides further support for the conclusion drawn from the previous volatile transport model simulation, which is consistent with the observed alignment between the pressures in 1989 and 2017; that is to say, the pressure fluctuation is modest. Moreover, this conclusion suggests the existence of a northern polar cap extended down to at least $45^\circ$N$-60^\circ$N and the presence of nitrogen between $30^\circ$S and $0^\circ$.
- Comment
Astronomy & Astrophysics, in press. 9 pages, 2 figures, 3 tables
- Published
- 2024
20. How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance
- Author
-
Li, Hongkang, Zhang, Shuai, Zhang, Yihua, Wang, Meng, Liu, Sijia, and Chen, Pin-Yu
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
Group imbalance has been a known problem in empirical risk minimization (ERM), where the achieved high average accuracy is accompanied by low accuracy in a minority group. Despite algorithmic efforts to improve the minority group accuracy, a theoretical generalization analysis of ERM on individual groups remains elusive. By formulating the group imbalance problem with the Gaussian Mixture Model, this paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance. Although our theoretical framework is centered on binary classification using a one-hidden-layer neural network, to the best of our knowledge, we provide the first theoretical analysis of the group-level generalization of ERM in addition to the commonly studied average generalization performance. Sample insights of our theoretical results include that when all group-level co-variance is in the medium regime and all mean are close to zero, the learning performance is most desirable in the sense of a small sample complexity, a fast training rate, and a high average and group-level testing accuracy. Moreover, we show that increasing the fraction of the minority group in the training data does not necessarily improve the generalization performance of the minority group. Our theoretical results are validated on both synthetic and empirical datasets, such as CelebA and CIFAR-10 in image classification.
- Published
- 2024
21. PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design
- Author
-
Jiang, Wenqi, Zhang, Shuai, Han, Boran, Wang, Jie, Wang, Bernie, and Kraska, Tim
- Subjects
Computer Science - Computation and Language
- Abstract
Retrieval-augmented generation (RAG) can enhance the generation quality of large language models (LLMs) by incorporating external token databases. However, retrievals from large databases can constitute a substantial portion of the overall generation time, particularly when retrievals are periodically performed to align the retrieved content with the latest states of generation. In this paper, we introduce PipeRAG, a novel algorithm-system co-design approach to reduce generation latency and enhance generation quality. PipeRAG integrates (1) pipeline parallelism to enable concurrent retrieval and generation processes, (2) flexible retrieval intervals to maximize the efficiency of pipeline parallelism, and (3) a performance model to automatically balance retrieval quality and latency based on the generation states and underlying hardware. Our evaluation shows that, by combining the three aforementioned methods, PipeRAG achieves up to 2.6$\times$ speedup in end-to-end generation latency while improving generation quality. These promising results showcase the effectiveness of co-designing algorithms with underlying systems, paving the way for the adoption of PipeRAG in future RAG systems.
- Published
- 2024
22. Light-induced giant enhancement of nonreciprocal transport at KTaO3-based interfaces
- Author
-
Zhang, Xu, Zhu, Tongshuai, Zhang, Shuai, Chen, Zhongqiang, Song, Anke, Zhang, Chong, Gao, Rongzheng, Niu, Wei, Chen, Yequan, Fei, Fucong, Tai, Yilin, Li, Guoan, Ge, Binghui, Lou, Wenkai, Shen, Jie, Zhang, Haijun, Chang, Kai, Song, Fengqi, Zhang, Rong, and Wang, Xuefeng
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Applied Physics
- Abstract
Nonlinear transport is a unique functionality of noncentrosymmetric systems, which reflects profound physics, such as spin-orbit interaction, superconductivity and band geometry. However, it remains highly challenging to enhance the nonreciprocal transport for promising rectification devices. Here, we observe a light-induced giant enhancement of nonreciprocal transport at the superconducting and epitaxial CaZrO3/KTaO3 (111) interfaces. The nonreciprocal transport coefficient undergoes a giant increase with three orders of magnitude up to $10^{5}$ A$^{-1}$T$^{-1}$. Furthermore, a strong Rashba spin-orbit coupling effective field of 14.7 T is achieved with abundant high-mobility photocarriers under ultraviolet illumination, which accounts for the giant enhancement of nonreciprocal transport coefficient. Our first-principles calculations further disclose the stronger Rashba spin-orbit coupling strength and the longer relaxation time in the photocarrier excitation process, bridging the light-property quantitative relationship. Our work provides an alternative pathway to boost nonreciprocal transport in noncentrosymmetric systems and facilitates the promising applications in opto-rectification devices and spin-orbitronic devices.
- Comment
38 pages, 17 figures
- Published
- 2024
23. Optimization of Superhydrophobic Surface Preparation Using One-Step Immersion and Control Variate Method
- Author
-
Zhang, Shuai, primary, Tan, Yong Chai, additional, Che, Hui Xin, additional, Tai, Vin Cent, additional, Sia, Yaw Yoong, additional, Janasekaran, Shamini, additional, and Tayier, Walisijiang, additional
- Published
- 2023
24. Study on the Relief Response of Cross-Regional Natural Disaster Risk
- Author
-
Wang, Yang, primary, Ning, Baokun, additional, Chen, Zhuo, additional, Zhang, Shuai, additional, and Deng, Duo, additional
- Published
- 2023
25. Effect of Core Strength Training on Technical Skill Performance of Combat Sport Players: A Systematic Review
- Author
-
Zhang, Shuai, primary and Soh, Kim Geok, additional
- Published
- 2023
26. Chemical Rules for Stacked Kagome and Honeycomb Topological Semimetals
- Author
-
Zhou, Liqin, Yang, Fazhi, Zhang, Shuai, and Zhang, Tiantian
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
- Abstract
We study the chemical rules for predicting and understanding topological states in stacked kagome and honeycomb lattices in both analytical and numerical ways. Starting with a minimal five-band tight-binding model, we sort out all the topological states into five groups, which are determined by the interlayer and intralayer hopping parameters. Combined with the model, we design an algorithm to obtain a series of experimentally synthesized topological semimetals with kagome and honeycomb layers, i.e., IAMX family (IA = Alkali metal element, M = Rare earth metal element, X = Carbon group element), in the inorganic crystal structure database. A follow-up high-throughput calculation shows that IAMX family materials are all nodal-line semimetals and they will be Weyl semimetals after taking spin-orbit coupling into consideration. To have further insights into the topology of the IAMX family, a detailed chemical rule analysis is carried out on the high-throughput calculations, including the lattice constants of the structure, intralayer and interlayer couplings, bond strengths, electronegativity, and so on, which are consistent with our tight-binding model. Our study provides a way to discover and modulate topological properties in stacked kagome and honeycomb crystals and offers candidates for studying topology-related properties like topological superconductors and axion insulators.
- Published
- 2024
27. Automatic Evaluation for Mental Health Counseling using LLMs
- Author
-
Li, Anqi, Lu, Yu, Song, Nirui, Zhang, Shuai, Ma, Lizhi, and Lan, Zhenzhong
- Subjects
Computer Science - Computation and Language
- Abstract
High-quality psychological counseling is crucial for mental health worldwide, and timely evaluation is vital for ensuring its effectiveness. However, obtaining professional evaluation for each counseling session is expensive and challenging. Existing methods that rely on self or third-party manual reports to assess the quality of counseling suffer from subjective biases and are time-consuming. To address the above challenges, this paper proposes an innovative and efficient automatic approach using large language models (LLMs) to evaluate the working alliance in counseling conversations. We collected a comprehensive counseling dataset and conducted multiple third-party evaluations based on therapeutic relationship theory. Our LLM-based evaluation, combined with our guidelines, shows high agreement with human evaluations and provides valuable insights into counseling scripts. This highlights the potential of LLMs as supervisory tools for psychotherapists. By integrating LLMs into the evaluation process, our approach offers a cost-effective and dependable means of assessing counseling quality, enhancing overall effectiveness.
- Comment
21 pages, 4 figures
- Published
- 2024
28. Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents
- Author
-
Zhang, Shuai, Lu, Yu, Liu, Junwen, Yu, Jia, Qiu, Huachuan, Yan, Yuming, and Lan, Zhenzhong
- Subjects
Computer Science - Computation and Language - Abstract
With the growing humanlike nature of dialog agents, people are now engaging in extended conversations that can stretch from brief moments to substantial periods of time. Understanding the factors that contribute to sustaining these interactions is crucial, yet existing studies primarily focus on short-term simulations that rarely explore such prolonged and real conversations. In this paper, we investigate the factors influencing retention rates in real interactions with roleplaying models. By analyzing a large dataset of interactions between real users and thousands of characters, we systematically examine multiple factors and assess their impact on user retention rate. Surprisingly, we find that the degree to which the bot embodies the roles it plays has limited influence on retention rates, while the length of each turn it speaks significantly affects retention rates. This study sheds light on the critical aspects of user engagement with role-playing models and provides valuable insights for future improvements in the development of large language models for role-playing purposes.
- Published
- 2024
29. Foundation Models for Recommender Systems: A Survey and New Perspectives
- Author
-
Huang, Chengkai, Yu, Tong, Xie, Kaige, Zhang, Shuai, Yao, Lina, and McAuley, Julian
- Subjects
Computer Science - Information Retrieval - Abstract
Recently, Foundation Models (FMs), with their extensive knowledge bases and complex architectures, have offered unique opportunities within the realm of recommender systems (RSs). In this paper, we attempt to thoroughly examine FM-based recommendation systems (FM4RecSys). We start by reviewing the research background of FM4RecSys. Then, we provide a systematic taxonomy of existing FM4RecSys research works, which can be divided into four different parts including data characteristics, representation learning, model type, and downstream tasks. Within each part, we review the key recent research developments, outlining the representative models and discussing their characteristics. Moreover, we elaborate on the open problems and opportunities of FM4RecSys aiming to shed light on future research directions in this area. In conclusion, we recap our findings and discuss the emerging trends in this field.
- Published
- 2024
30. MolTC: Towards Molecular Relational Modeling In Language Models
- Author
-
Fang, Junfeng, Zhang, Shuai, Wu, Chang, Yang, Zhengyi, Liu, Zhiyuan, Li, Sihang, Wang, Kun, Du, Wenjie, and Wang, Xiang
- Subjects
Quantitative Biology - Quantitative Methods ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods predominantly rely on textual data, thus not fully harnessing the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the issue of information underutilization, as it hinders the sharing of interaction mechanisms learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrates the graphical information of the two molecules in a pair. To achieve unified MRL, MolTC innovatively develops a dynamic parameter-sharing strategy for cross-dataset information sharing. Moreover, to train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and construct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, exhibit the superiority of our method over current GNN and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.
- Published
- 2024
31. Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems
- Author
-
Cai, Tianzhang, Wang, Qichen, Zhang, Shuai, Demir, Özlem Tuğfe, and Cavdar, Cicek
- Subjects
Computer Science - Information Theory ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
We develop a multi-agent reinforcement learning (MARL) algorithm to minimize the total energy consumption of multiple massive MIMO (multiple-input multiple-output) base stations (BSs) in a multi-cell network while preserving the overall quality-of-service (QoS) by making decisions on the multi-level advanced sleep modes (ASMs) and antenna switching of these BSs. The problem is modeled as a decentralized partially observable Markov decision process (DEC-POMDP) to enable collaboration between individual BSs, which is necessary to tackle inter-cell interference. A multi-agent proximal policy optimization (MAPPO) algorithm is designed to learn a collaborative BS control policy. To enhance its scalability, a modified version called MAPPO-neighbor policy is further proposed. Simulation results demonstrate that the trained MAPPO agent achieves better performance compared to baseline policies. Specifically, compared to the auto sleep mode 1 (symbol-level sleeping) algorithm, the MAPPO-neighbor policy reduces power consumption by approximately 8.7% during low-traffic hours and improves energy efficiency by approximately 19% during high-traffic hours, respectively.
- Published
- 2024
32. Two-dimensional silk
- Author
-
Shi, Chenyang, Zorman, Marlo, Zhao, Xiao, Salmeron, Miquel B., Pfaendtner, Jim, Liu, Xiang Yang, Zhang, Shuai, and De Yoreo, James
- Subjects
Condensed Matter - Materials Science - Abstract
The ability to form silk films on semiconductors, metals, and oxides or as free-standing membranes has motivated research into silk-based electronic, optical, and biomedical devices. However, the inherent disorder of native silk limits device performance. Here we report the creation of highly ordered two-dimensional (2D) silk fibroin (SF) layers on van der Waals solids. Using in situ atomic force microscopy, synchrotron-based infrared spectroscopy, and molecular dynamics simulations, we develop a mechanistic understanding of the assembly process. We show that the films consist of lamellae having an epitaxial relationship with the underlying lattice and that the SF molecules exhibit the same Beta-sheet secondary structure seen in the crystallites of the native form. By increasing the SF concentration, multilayer films form via layer-by-layer growth, either along a classical pathway in which SF molecules assemble directly into the lamellae or, at sufficiently high concentrations, along a two-step pathway beginning with formation of a disordered monolayer that subsequently converts into the crystalline phase. Kelvin probe measurements show that these 2D SF layers substantially alter the surface potential. Moreover, the ability to assemble 2D silk on both graphite and MoS2 suggests that it may provide a general platform for silk-based electronics on vdW solids.
- Published
- 2024
33. Pressure waves from air gun bubbles: a numerical analysis based on Finite Volume Method
- Author
-
Wang, Shi-Ping, Geng, Hang, Zhang, Shuai, and Wang, Si-Wei
- Subjects
Physics - Fluid Dynamics ,I.6.6 - Abstract
The pressure wave emitted from the air gun contains many frequencies, among which the low-frequency waves are desirable for exploration and imaging, while the high-frequency waves need to be suppressed as they are harmful to marine species. The high-frequency waves originate from the fast oscillations of the flow during the release of the air, such as the impingement of the gas jet into the liquid, the expansion of the air gun bubble, and the interaction between the air gun body and the bubble. However, those dynamics and the emitted waves are adjustable by the special design of the air guns. To analyze the underlying relations, we present a numerical study with a compressible air gun bubble model using the volume of fluid (VOF) approach combined with the finite volume method (FVM) implemented in STAR-CCM+. The venting process of an air gun is investigated to reveal the influence of the air gun body. The results show that air gun pressure for the far field is mainly proportional to the expansion acceleration of the whole gas. Our results also indicate that the opening and chamber shape of the air gun affect the gas expansion acceleration, which influences the first peak of the pressure wave significantly. The larger the opening is, the faster the gas is released and the greater the amplitude of the first peak is. The larger the chamber length/diameter ratio, the slower the gas is released and the lower the amplitude of the first peak. Comment: 21 pages, 15 figures
- Published
- 2024
- Full Text
- View/download PDF
34. CaMML: Context-Aware Multimodal Learner for Large Models
- Author
-
Chen, Yixin, Zhang, Shuai, Han, Boran, He, Tong, and Li, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we introduce Context-Aware MultiModal Learner (CaMML), for tuning large multimodal models (LMMs). CaMML, a lightweight module, is crafted to seamlessly integrate multimodal contextual samples into large models, thereby empowering the model to derive knowledge from analogous, domain-specific, up-to-date information and make grounded inferences. Importantly, CaMML is highly scalable and can efficiently handle lengthy multimodal context examples owing to its hierarchical design. Based on CaMML, we have developed two multimodal models, CaMML-7B and CaMML-13B, that have shown exceptional performance across an array of benchmark datasets for multimodal tasks. Remarkably, CaMML-13B achieves the state-of-the-art performance on over ten widely recognized multimodal benchmark datasets, surpassing LLaVA-1.5 (13B) with a noticeable margin, without integration of any external resources. Moreover, we have conducted extensive ablative studies to inspect the inner workings of CaMML and performed qualitative analyses to showcase its effectiveness in handling real-world challenging cases. Comment: Preprint
- Published
- 2024
35. Observation of a 1/3 Magnetisation Plateau Phase as Evidence for the Kitaev Interaction in a Honeycomb-Lattice Antiferromagnet
- Author
-
Shangguan, Yanyan, Bao, Song, Dong, Zhao-Yang, Xi, Ning, Gao, Yi-Peng, Ma, Zhen, Wang, Wei, Qi, Zhongyuan, Zhang, Shuai, Huang, Zhentao, Liao, Junbo, Zhao, Xiaoxue, Zhang, Bo, Cheng, Shufan, Xu, Hao, Yu, Dehong, Mole, Richard A., Murai, Naoki, Ohira-Kawamura, Seiko, He, Lunhua, Hao, Jiazheng, Yan, Qing-Bo, Song, Fengqi, Li, Wei, Yu, Shun-Li, Li, Jian-Xin, and Wen, Jinsheng
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Superconductivity - Abstract
Fractional magnetisation plateaus, in which the magnetisation is pinned at a fraction of its saturated value within a range of external magnetic field, are spectacular macroscopic manifestations of collective quantum behaviours. One prominent example of the plateau phase is found in spin-1/2 triangular-lattice antiferromagnets featuring strong geometrical frustration, and is often interpreted as a quantum-fluctuation-stabilised state in magnetic field via the "order-by-disorder" mechanism. Here, we observe an unprecedented 1/3 magnetisation plateau between 5.2 and 7.4 T at 2 K in a spin-1 antiferromagnet Na$_3$Ni$_2$BiO$_6$ with a honeycomb lattice, where conventionally no geometrical frustration is anticipated. By carrying out elastic neutron scattering measurements, we propose the spin structure of the plateau phase to be an unusual partial spin-flop ferrimagnetic order, transitioning from the zigzag antiferromagnetic order in zero field. Our theoretical calculations show that the plateau phase is stabilised by the bond-anisotropic Kitaev interaction. These results provide a new paradigm for the exploration of rich quantum phases in frustrated magnets and exotic Kitaev physics in high-spin systems. Comment: Submitted version, 10 pages, 5 figures. Final version has been published in Nature Physics
- Published
- 2023
- Full Text
- View/download PDF
36. A dynamical clipping approach with task feedback for Proximal Policy Optimization
- Author
-
Zhang, Ziqi, Xu, Jingzehua, Zhuang, Zifeng, Liu, Jinxin, Wang, Donglin, and Zhang, Shuai
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Proximal Policy Optimization (PPO) has been broadly applied to various domains, including Large Language Model (LLM) optimization and Robotics learning, etc. However, PPO is limited by a fixed setting for the clipping bound. Specifically, there is no theoretical proof that the optimal clipping bound remains consistent throughout the entire training process. Truncating the ratio of the new and old policies with a unique clipping bound ensures stable training and can achieve the best training performance. Additionally, previous research suggests that a fixed clipping bound limits the agent's exploration. Therefore, researching a dynamical clipping bound to enhance PPO's performance can be highly beneficial. Different from previous clipping approaches, we consider increasing the maximum cumulative Return in reinforcement learning (RL) tasks as the preference of the RL task, and propose a bi-level proximal policy optimization paradigm, which involves not only optimizing the policy but also dynamically adjusting the clipping bound to reflect the preference of the RL tasks to further elevate the training outcomes and stability of PPO. Based on this bi-level proximal policy optimization paradigm, we introduce a new algorithm named Preference based Proximal Policy Optimization (Pb-PPO). This algorithm utilizes a multi-armed bandit algorithm to reflect RL preferences (we also validate that such approach can be utilized to reflect human preference), recommending the optimal clipping bound for PPO in each epoch, thereby achieving more stable and better training outcomes.
- Published
- 2023
37. Identification of Carbon Stars from LAMOST DR7
- Author
-
Li, Linlin, Zhang, Kecheng, Cui, Wenyuan, Shi, Jianrong, Ji, Wei, Huo, Zhenyan, Gao, Yawei, Zhang, Shuai, and Sun, Mingxu
- Subjects
Astrophysics - Astrophysics of Galaxies ,Astrophysics - Solar and Stellar Astrophysics - Abstract
Carbon stars are excellent kinematic tracers of galaxies and play important roles in understanding the evolution of the Galaxy. Therefore, it is worthwhile to search for them in a large number of spectra. In this work, we build a new carbon star catalog based on the LAMOST DR7 spectra. The catalog contains 4542 spectra of 3546 carbon stars, identified through line index and near-infrared color-color diagrams. Through visual inspection of the spectra, we further subclassify them into 925 C-H, 384 C-R, 608 C-N, and 1292 Ba stars. However, 437 stars could not be sub-classified due to their low signal-to-noise. Moreover, by comparing with LAMOST DR7 pipeline we find 567 more carbon stars and visually sub-classify them. We find that on the $J-H$ vs. $H-K_{\rm s}$ two-color diagram, C-N stars can be reliably distinguished from the other three sub-types. Additionally, by utilizing the Gaia distance, we study the distribution of carbon stars in the H-R diagram and identify 258 dwarf carbon stars by the criterion $M_{\rm G}>$5.0\,mag. Finally, we present the spatial distribution in Galactic coordinates of the 3546 carbon stars. The majority of C-N, C-R, and Ba stars are distributed at low Galactic latitudes, while most C-H and dC stars are distributed at high Galactic latitudes.
- Published
- 2023
38. Characterizing the COVID-19 Infodemic on Chinese Social Media: Exploratory Study
- Author
-
Zhang, Shuai, Pian, Wenjing, Ma, Feicheng, Ni, Zhenni, and Liu, Yunmei
- Subjects
Public aspects of medicine ,RA1-1270 - Abstract
Background: The COVID-19 infodemic has been disseminating rapidly on social media and posing a significant threat to people’s health and governance systems. Objective: This study aimed to investigate and analyze posts related to COVID-19 misinformation on major Chinese social media platforms in order to characterize the COVID-19 infodemic. Methods: We collected posts related to COVID-19 misinformation published on major Chinese social media platforms from January 20 to May 28, 2020, by using PythonToolkit. We used content analysis to identify the quantity and source of prevalent posts and topic modeling to cluster themes related to the COVID-19 infodemic. Furthermore, we explored the quantity, sources, and theme characteristics of the COVID-19 infodemic over time. Results: The daily number of social media posts related to the COVID-19 infodemic was positively correlated with the daily number of newly confirmed (r=0.672, P
- Published
- 2021
- Full Text
- View/download PDF
39. Optimize the event selection strategy to study the anomalous quartic gauge couplings at muon colliders using the support vector machine and quantum support vector machine
- Author
-
Zhang, Shuai, Guo, Yu-Chen, and Yang, Ji-Chong
- Subjects
High Energy Physics - Phenomenology - Abstract
The search for new physics (NP) beyond the Standard Model is one of the most important topics in current high energy physics. With the increasing luminosities at the colliders, the search for NP signals requires the analysis of more and more data, and the efficiency in data processing becomes particularly important. As a machine learning algorithm, support vector machine (SVM) is expected to be useful in the search for NP. Meanwhile, quantum computing has the potential to offer huge advantages when dealing with large amounts of data, which suggests that quantum SVM (QSVM) is a potential tool in future phenomenological studies of NP. This paper studies how to use SVM and QSVM to optimize event selection strategies in the search for NP signals. Taking the tri-photon process at a muon collider as an example, it can be shown that the event selection strategies optimized by the SVM and QSVM are effective in the search for the dimension-8 operators contributing to the anomalous quartic gauge couplings. Comment: 41 pages, 8 figures
- Published
- 2023
40. Electronic interactions in Dirac fluids visualized by nano-terahertz spacetime mapping
- Author
-
Xu, Suheng, Li, Yutao, Vitalone, Rocco A., Jing, Ran, Sternbach, Aaron J., Zhang, Shuai, Ingham, Julian, Delor, Milan, McIver, James. W., Yankowitz, Matthew, Queiroz, Raquel, Millis, Andrew J., Fogler, Michael M., Dean, Cory R., Hone, James, Liu, Mengkun, and Basov, D. N.
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Mesoscale and Nanoscale Physics ,Physics - Optics - Abstract
Ultraclean graphene at charge neutrality hosts a quantum critical Dirac fluid of interacting electrons and holes. Interactions profoundly affect the charge dynamics of graphene, which is encoded in the properties of its collective modes: surface plasmon polaritons (SPPs). The group velocity and lifetime of SPPs have a direct correspondence with the reactive and dissipative parts of the tera-Hertz (THz) conductivity of the Dirac fluid. We succeeded in tracking the propagation of SPPs over sub-micron distances at femto-second (fs) time scales. Our experiments uncovered prominent departures from the predictions of the conventional Fermi-liquid theory. The deviations are particularly strong when the densities of electrons and holes are approximately equal. Our imaging methodology can be used to probe the electromagnetics of quantum materials other than graphene in order to provide fs-scale diagnostics under near-equilibrium conditions.
- Published
- 2023
41. Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention
- Author
-
Zhang, Shuai and Xie, Minghong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Semantic segmentation of RGB-D images involves understanding the appearance and spatial relationships of objects within a scene, which requires careful consideration of various factors. However, in indoor environments, the simple input of RGB and depth images often results in a relatively limited acquisition of semantic and spatial information, leading to suboptimal segmentation outcomes. To address this, we propose the Multi-modal Interaction and Pooling Attention Network (MIPANet), a novel approach designed to harness the interactive synergy between RGB and depth modalities, optimizing the utilization of complementary information. Specifically, we incorporate a Multi-modal Interaction Fusion Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Additionally, we introduce a Pooling Attention Module (PAM) at various stages of the encoder. This module serves to amplify the features extracted by the network and integrates the module's output into the decoder in a targeted manner, significantly improving semantic segmentation performance. Our experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYUDv2 and SUN-RGBD, underscoring its effectiveness in enhancing RGB-D semantic segmentation.
- Published
- 2023
42. Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design
- Author
-
Yu, Jia, Zhang, Lichao, Chen, Zijie, Pan, Fayu, Wen, MiaoMiao, Yan, Yuming, Weng, Fangsheng, Zhang, Shuai, Pan, Lili, and Lan, Zhenzhong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.
- Published
- 2023
43. PsyBench: a balanced and in-depth Psychological Chinese Evaluation Benchmark for Foundation Models
- Author
-
Zhang, Junlei, He, Hongliang, Song, Nirui, He, Shuyuan, Zhang, Shuai, Qiu, Huachuan, Li, Anqi, Ma, Lizhi, and Lan, Zhenzhong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
As Large Language Models (LLMs) are becoming prevalent in various fields, there is an urgent need for improved NLP benchmarks that encompass all the necessary knowledge of individual disciplines. Many contemporary benchmarks for foundational models emphasize a broad range of subjects but often fall short in presenting all the critical subjects and encompassing the necessary professional knowledge of them. This shortfall has led to skewed results, given that LLMs exhibit varying performance across different subjects and knowledge areas. To address this issue, we present psybench, the first comprehensive Chinese evaluation suite that covers all the necessary knowledge required for graduate entrance exams. psybench offers a deep evaluation of a model's strengths and weaknesses in psychology through multiple-choice questions. Our findings show significant differences in performance across different sections of a subject, highlighting the risk of skewed results when the knowledge in test sets is not balanced. Notably, only the ChatGPT model reaches an average accuracy above $70\%$, indicating that there is still plenty of room for improvement. We expect that psybench will help to conduct thorough evaluations of base models' strengths and weaknesses and assist in practical application in the field of psychology.
- Published
- 2023
44. Efficient Beam Manipulation with Phase Symmetry Operations on Modulated Metasurfaces
- Author
-
Cai, Yang, Mei, Peng, Pedersen, Gert Frølund, and Zhang, Shuai
- Subjects
Physics - Optics ,Physics - Applied Physics - Abstract
Beam manipulation is of paramount importance in wave engineering, enabling diverse beam shapes like pencil beams, flat-top beams, and isoflux beams to cater to various application missions. Among the beams, shaping flat-top and isoflux beams remains challenging with the traditional synthesis approaches that mainly rely on optimization algorithms. Here, we develop modulated metasurfaces to efficiently generate flat-top and isoflux beams from the first principle of field superposition with negligible optimizations, by performing phase symmetry operations. The theoretical analysis not only facilitates the shaping of 1D and 2D flat-top and isoflux beams but also exhibits controllable beamwidths. Experimental validation confirms the efficacy of the phase symmetry operations in generating flat-top beams with adjustable beamwidths. The concept of phase symmetry operations can be extended to other vectorial components, offering potential applications for the manipulation of various wave types such as acoustic waves, water surface waves, and beyond, thereby advancing related applications.
- Published
- 2023
45. Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
- Author
-
Lv, Changsheng, Zhang, Shuai, Tian, Yapeng, Qi, Mengshi, and Ma, Huadong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we propose a Disentangled Counterfactual Learning (DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, with the main challenge being how to imitate the reasoning ability of humans. Most of the current methods fail to take full advantage of different characteristics in multi-modal data, and the lack of causal reasoning ability in models impedes the progress of implicit physical knowledge inferring. To address these issues, our proposed DCL method decouples videos into static (time-invariant) and dynamic (time-varying) factors in the latent space by the disentangled sequential encoder, which adopts a variational autoencoder (VAE) to maximize the mutual information with a contrastive loss function. Furthermore, we introduce a counterfactual learning module to augment the model's reasoning ability by modeling physical knowledge relationships among different objects under counterfactual intervention. Our proposed method is a plug-and-play module that can be incorporated into any baseline. In experiments, we show that our proposed method improves baseline methods and achieves state-of-the-art performance. Our source code is available at https://github.com/Andy20178/DCL. Comment: To be published in 37th Conference on Neural Information Processing Systems
- Published
- 2023
46. On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration
- Author
-
Zhang, Shuai, Li, Hongkang, Wang, Meng, Liu, Miao, Chen, Pin-Yu, Lu, Songtao, Liu, Sijia, Murugesan, Keerthiram, and Chaudhury, Subhajit
- Subjects
Computer Science - Machine Learning - Abstract
This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with decaying $\epsilon$ converges to the optimal Q-value function geometrically. Moreover, a higher level of $\epsilon$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $\epsilon$ values. Experiments justify our established theoretical insights on DQNs.
- Published
- 2023
47. Lightweight In-Context Tuning for Multimodal Unified Models
- Author
-
Chen, Yixin, Zhang, Shuai, Han, Boran, and Jia, Jiaya
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In-context learning (ICL) involves reasoning from given contextual examples. As more modalities are added, this procedure becomes more challenging because the interleaved input modalities convolute the understanding process. This is exemplified by the observation that multimodal models often struggle to effectively extrapolate from contextual examples to perform ICL. To address these challenges, we introduce MultiModal In-conteXt Tuning (M$^2$IXT), a lightweight module to enhance the ICL capabilities of multimodal unified models. The proposed M$^2$IXT module perceives an expandable context window to incorporate various labeled examples of multiple modalities (e.g., text, image, and coordinates). It can be prepended to various multimodal unified models (e.g., OFA, Unival, LLaVA) of different architectures and trained via a mixed-tasks strategy to enable rapid few-shot adaption on multiple tasks and datasets. When tuned on as little as 50K multimodal data, M$^2$IXT can boost the few-shot ICL performance significantly (e.g., 18\% relative increase for OFA), and obtain state-of-the-art results across an array of tasks including visual question answering, image captioning, visual grounding, and visual entailment, while being considerably small in terms of model parameters (e.g., $\sim$$20\times$ smaller than Flamingo or MMICL), highlighting the flexibility and effectiveness of M$^2$IXT as a multimodal in-context learner. Comment: Preprint
- Published
- 2023
48. Offline Imitation Learning with Variational Counterfactual Reasoning
- Author
-
He, Bowei, Sun, Zexu, Liu, Jinxin, Zhang, Shuai, Chen, Xu, and Ma, Chen
- Subjects
Computer Science - Machine Learning - Abstract
In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotics manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Due to the scarce expert data, the agents usually suffer from simply memorizing poor trajectories and are vulnerable to variations in the environments, lacking the capability of generalizing to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA) by doing counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement of generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization. Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23. Comment: Published on NeurIPS2023
- Published
- 2023
49. Aerial Base Stations: Practical Considerations for Power Consumption and Service Time
- Author
-
Seeram, Siva Satya Sri Ganesh, Zhang, Shuai, Ozger, Mustafa, Grabs, Andre, Holis, Jaroslav, and Cavdar, Cicek
- Subjects
Computer Science - Networking and Internet Architecture ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Aerial base stations (ABSs) have emerged as a promising solution to meet the high traffic demands of future wireless networks. Nevertheless, their practical implementation requires efficient utilization of limited payload and onboard energy. Understanding the power consumption streams, such as mechanical and communication power, and their relationship to the payload is crucial for analyzing their feasibility. Specifically, we focus on rotary-wing drones (RWDs), fixed-wing drones (FWDs), and high-altitude platforms (HAPs), analyzing their energy consumption models and key performance metrics such as power consumption, energy harvested-to-consumption ratio, and service time with varying wingspans, battery capacities, and regions. Our findings indicate that FWDs have longer service times and that HAPs have energy harvested-to-consumption ratios greater than one, indicating theoretically infinite service time, especially when they are deployed in near-equator regions or have a large wingspan. Additionally, we investigate a case study of RWD-BS deployment, assessing aerial network dimensioning aspects such as ABS coverage radius based on altitude, environment, and frequency of operation. Our findings provide valuable insights for researchers and telecom operators, facilitating effective cost planning by determining the number of ABSs and backup batteries required for uninterrupted operations., Comment: 6 pages, 3 figures, 5 tables, conference
- Published
- 2023
50. Reconciling results of 2019 and 2020 stellar occultations on Pluto's atmosphere. New constraints from both the 5 September 2019 event and consistency analysis
- Author
-
Yuan, Ye, Li, Fan, Fu, Yanning, Chen, Jian, Tan, Wei, Zhang, Shuai, Zhang, Wei, Zhang, Chen, Zhang, Qiang, Ye, Jiahui, Li, Delai, Zhu, Yijing, Fu, Zhensen, Zhu, Ansheng, Chen, Yue, Xu, Jun, and Zhang, Yang
- Subjects
Astrophysics - Earth and Planetary Astrophysics - Abstract
A stellar occultation by Pluto on 5 September 2019 yielded positive detections at two separate stations. Using an approach consistent with comparable studies, we derived a surface pressure of $11.478 \pm 0.55~\mathrm{\mu bar}$ for Pluto's atmosphere from the observations of this event. In addition, to avoid potential method inconsistencies highlighted by Sicardy et al. when comparing with historical pressure measurements, we reanalyzed the data from the 15 August 2018 and 17 July 2019 events. All the new measurements provide a bridge between the two different perspectives on the pressure variation since 2015: a rapid pressure drop from previous studies of the 15 August 2018 and 17 July 2019 events and a plateau phase from that of the 6 June 2020 event. The pressure measurement from the 5 September 2019 event aligns with those from 2016, 2018, and 2020, supporting the latter perspective. While the measurements from the 4 June 2011 and 17 July 2019 events suggest probable V-shaped pressure variations unaccounted for by the volatile transport model (VTM) from Meza et al., the VTM remains applicable on average. Moreover, the validity of the V-shaped variations is debatable due to the stellar faintness of the 4 June 2011 event and the grazing single-chord geometry of the 17 July 2019 event. To reveal and understand all significant pressure variations of Pluto's atmosphere, it is essential to provide constraints on both short-term and long-term evolutions of the interacting atmosphere and surface by continuous pressure monitoring through occultation observations, whenever possible, complemented by frequent spectroscopy and photometry of the surface., Comment: Astronomy & Astrophysics, in press. 10 pages, 6 figures
- Published
- 2023